Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14123] [SPARK-14384] [SQL] Handle CreateFunction/DropFunction #12117

Closed
wants to merge 49 commits into from
Closed

[SPARK-14123] [SPARK-14384] [SQL] Handle CreateFunction/DropFunction #12117

wants to merge 49 commits into from

Conversation

yhuai
Copy link
Contributor

@yhuai yhuai commented Apr 1, 2016

What changes were proposed in this pull request?

This PR implements CreateFunction and DropFunction commands. Besides implementing these two commands, we also change how to manage functions. Here are the main changes.

  • FunctionRegistry will be a container to store all functions builders and it will not actively load any functions. Because of this change, we do not need to maintain a separate registry for HiveContext. So, HiveFunctionRegistry is deleted.
  • SessionCatalog takes care the job of loading a function if this function is not in the FunctionRegistry but its metadata is stored in the external catalog. For this case, SessionCatalog will (1) load the metadata from the external catalog, (2) load all needed resources (i.e. jars and files), (3) create a function builder based on the function definition, (4) register the function builder in the FunctionRegistry.
  • A UnresolvedGenerator is created. So, the parser will not need to call FunctionRegistry directly during parsing, which is not a good time to create a Hive UDTF. In the analysis phase, we will resolve UnresolvedGenerator.

This PR is based on @viirya's #12036

How was this patch tested?

Existing tests and new tests.

TODOs

[x] Self-review
[x] Cleanup
[x] More tests for create/drop functions (we need to more tests for permanent functions).
[ ] File JIRAs for all TODOs
[x] Standardize the error message when a function does not exist.

viirya and others added 26 commits March 27, 2016 13:15
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala
@SparkQA
Copy link

SparkQA commented Apr 1, 2016

Test build #54723 has finished for PR 12117 at commit e0570cd.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 1, 2016

Test build #54724 has finished for PR 12117 at commit 0a1866c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* alias: the class name that implements the created function.
* resources: Jars, files, or archives which need to be added to the environment when the function
* is referenced for the first time by a session.
* isTemp: indicates if it is a temporary function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you follow the javadoc format of the other commands, something like:

  * ...
  * The syntax of using this command in SQL is:
  * {{{
  *    SHOW FUNCTIONS [LIKE pattern]
  * }}}

The command format is pretty useful. Same in DropFunction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@andrewor14
Copy link
Contributor

Thanks @yhuai. This looks pretty good. I left a few suggestions on how the documentation, code readability and placement of functionality can be improved, but these are all relatively minor. Once you address the comments I will go ahead and merge this.

Also thanks @viirya for the original changes.

@viirya
Copy link
Member

viirya commented Apr 5, 2016

Thanks @yhuai for improving the original changes. This looks great now.

@@ -208,6 +261,118 @@ class HiveSparkSubmitSuite
}
}

// This application is used to test defining a new Hive UDF (with an associated jar)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea. This is good!

@@ -201,8 +201,13 @@ class TestHiveContext private[hive](
}

override lazy val functionRegistry = {
new TestHiveFunctionRegistry(
org.apache.spark.sql.catalyst.analysis.FunctionRegistry.builtin.copy(), self.executionHive)
// TestHiveFunctionRegistry tracks removed functions. So, we cannot simply use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I do not fully understand this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@SparkQA
Copy link

SparkQA commented Apr 5, 2016

Test build #54931 has finished for PR 12117 at commit 88fd93c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 5, 2016

Test build #54945 has finished for PR 12117 at commit 8a41f6d.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 5, 2016

Test build #54948 has finished for PR 12117 at commit 361421c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Apr 5, 2016

LGTM now.

@SparkQA
Copy link

SparkQA commented Apr 5, 2016

Test build #54987 has finished for PR 12117 at commit 3938766.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
if (!ifExists && !catalog.functionExists(func)) {
throw new AnalysisException(
s"Function '$functionName' does not exist in database '$dbName'.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this should be handled within dropFunction itself (by passing the ifExists flag), but not a big deal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yes. I totally agree. We should do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewor14
Copy link
Contributor

Thanks @yhuai @viirya I'm merging this into master.

@asfgit asfgit closed this in 72544d6 Apr 5, 2016
@yhuai yhuai deleted the function branch April 5, 2016 20:52
asfgit pushed a commit that referenced this pull request Apr 7, 2016
## What changes were proposed in this pull request?

This is a followup to #12117 and addresses some of the TODOs introduced there. In particular, the resolution of database is now pushed into session catalog, which knows about the current database. Further, the logic for checking whether a function exists is pushed into the external catalog.

No change in functionality is expected.

## How was this patch tested?

`SessionCatalogSuite`, `DDLSuite`

Author: Andrew Or <andrew@databricks.com>

Closes #12198 from andrewor14/function-exists.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants